Data Mining at the Intersection of Psychology and Linguistics

Author

  • R. Harald Baayen

Abstract

Large data resources play an increasingly important role in both linguistics and psycholinguistics. The first data resources used by psychologists and linguists alike were word frequency lists such as Thorndike and Lorge (1944) and Kučera and Francis (1967). The Brown corpus, on which the frequency counts of Kučera and Francis were based, was very large for its time, comprising some one million word forms carefully sampled from different registers of English. Even so, many common words did not appear in the frequency lists, while others appeared with counterintuitive frequencies of use. Gernsbacher (1984) addressed this issue, claiming that subjective frequency estimates are superior to objective frequency counts. On this view, corpus-based frequency counts are inherently unreliable because of regression towards the mean: in another corpus, higher-frequency words would be less frequent and lower-frequency words more frequent. These considerations have led many psychologists to turn away from research directly addressing frequency effects in lexical processing. This distrust of corpus-based frequency data in psychology mirrors the rejection, in generative linguistics, of corpora as a valid source of information about grammar. Fortunately, more and larger corpora were developed, driven in part by the needs of commercial lexicography, in part by the research interests of corpus linguistics, and in part by the growing need for reliable data in computational linguistics and linguistic engineering. These develop…

Similar resources

Determination of optimal bandwidth in upscaling process of reservoir data using kernel function bandwidth

Upscaling based on the bandwidth of the kernel function is a flexible approach to upscaling the data, because cells are coarsened according to local variability. The bandwidth controls how strongly cells are coarsened in this method. In regions of smooth variability, a large number of cells are merged; conversely, cells remain fine where variability is severe. Bandwidth variatio...

A Brief Survey of Text Mining

The enormous amount of information stored in unstructured texts cannot simply be used for further processing by computers, which typically handle text as simple sequences of character strings. Therefore, specific (pre-)processing methods and algorithms are required in order to extract useful patterns. Text mining refers generally to the process of extracting interesting information and knowledg...

Morphological, Sedimentary and Hydrodynamic Study in Intersection of the Arvand River and the Karun River by Using Field Data and Numerical Modeling

Sound engineering work to protect rivers requires understanding the morphological behavior of the river and studying the hydrodynamic phenomena of the area. The intersection of the Karun River, the largest and longest river in Iran, with the Arvand border river is of considerable importance due to its strategic location. In this paper, using field measure...

A new approach for assessing stability of rock slopes considering centroids of weak zones

The intersection lines between discontinuity surfaces, and their intersection points on the visible surfaces of an engineering structure, may serve as indicators of instability. This paper describes a new approach to modelling these intersecting lines and points that provides a first evaluation of potential instability in an engineering structure, characterized by its failure modes. In this work, the i...

Publication date: 2005